Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Type hints #203

Open
wants to merge 6 commits into
base: main
Choose a base branch
from
Open

Type hints #203

wants to merge 6 commits into from

Conversation

LecrisUT
Copy link
Contributor

@LecrisUT LecrisUT commented Aug 9, 2023

I don't like that the data dict type is an Any, but this should work

  • Tree
  • Context
  • filter
  • Other internal (probably in a different PR)

@happz
Copy link
Collaborator

happz commented Aug 9, 2023

One quick note: | is supported from Python 3.10. I don’t know about other users, but at least tmt would still support Python 3.9. Therefore you might need to use Union or wrap unions with | with a string (”int | str”) to hide them from Python 3.9 interpreter.

@LecrisUT
Copy link
Contributor Author

LecrisUT commented Aug 9, 2023

  • Rhel8 still uses 3.6, which might be a pain to make it compatible
  • Rhel9 error is about |, so yeah should change that to Optional and Union
  • I am not sure if I got the Context typing correct
  • Should add mypy to the CI in this PR

@happz
Copy link
Collaborator

happz commented Aug 9, 2023

Should add mypy to the CI in this PR

Absolutely, you can copy the section from tmt's pre-commit configuration, it served there well.

fmf/context.py Outdated Show resolved Hide resolved
fmf/base.py Outdated Show resolved Hide resolved
fmf/base.py Outdated Show resolved Hide resolved
# JSON: TypeAlias = dict[str, "JSON"] | list["JSON"] | str | int | float | bool | None
DataType: TypeAlias = Union[RawDataType, ListDataType, DictDataType]
TreeData: TypeAlias = dict[str, DataType]
TreeDataPath: TypeAlias = Union[TreeData, str] # Either TreeData or path
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the naming can be confusing here between TreeData, TreeDataPath and DataType. Any suggestions?

@LecrisUT
Copy link
Contributor Author

MyPy is up and there are issues to resolve

@LecrisUT
Copy link
Contributor Author

All tests are green here right now. Any review please?

fmf/base.py Outdated Show resolved Hide resolved
fmf/base.py Outdated
return
# Use the special merge for merging dictionaries
data_val = data[key]
if type(data_val) == type(value) == dict:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It was the original code, but this is not the pythonic way of checking object classes. if isinstance(data_val, dict) and isinstance(value, dict) will work better, and supportes dict subclasses - there's no reason to not allow e.g. OrderedDict to be the source for a node.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about str, do we hard-check str?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In which condition? I see elif type(data[key]) == type(value) == type(""): a couple of lines below, here I would also use isinstance: elif isinstance(data[key], str) and isinstance(value, str) - is it this one you're asking about?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, in those kind of situations. I think it is better to hard-check for str type in those cases, and implement a base type if we want other merge-able string subtypes. Dict one, could also get a similar treatment eventually, but I can see a benefit for changing the type there for the time being (i.e. before converting all raw dictionary types to attrs dataclasses that would include the adjust and other fmf directives).

But on that note, do you have a small snippet to confirm that isinstance works with OrderedDict and Mapping types? I don't think it would work with the latter.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, in those kind of situations. I think it is better to hard-check for str type in those cases, and implement a base type if we want other merge-able string subtypes. Dict one, could also get a similar treatment eventually, but I can see a benefit for changing the type there for the time being (i.e. before converting all raw dictionary types to attrs dataclasses that would include the adjust and other fmf directives).

IUIC, a hard check would be type(value) == type(''), but what would be the benefit over isinstance(value, str)? I'm not sure we understand each other :)

But on that note, do you have a small snippet to confirm that isinstance works with OrderedDict and Mapping types? I don't think it would work with the latter.

isinstance(OrderedDict(), dict) is true, isinstance(AMappingSubclass(), dict) is not. isinstance(AMappingSubclass(), (dict, collections.abc.Mapping)) should cover this base class as well.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IUIC, a hard check would be type(value) == type(''), but what would be the benefit over isinstance(value, str)? I'm not sure we understand each other :)

Yes, though I prefer type(value) == str. That would be to enforce we are only checking for str. Mostly as a sanity check since I don't think we should be calling _mergeplus() as data[key] += value in those cases, but instead check for specialized overloads.

isinstance(OrderedDict(), dict) is true, isinstance(AMappingSubclass(), dict) is not. isinstance(AMappingSubclass(), (dict, collections.abc.Mapping)) should cover this base class as well.

Hmm, how about if I implement the general case for isinstance(value), but keep type(data_val) == dict for sanity check until we implement attrs?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IUIC, a hard check would be type(value) == type(''), but what would be the benefit over isinstance(value, str)? I'm not sure we understand each other :)

Yes, though I prefer type(value) == str. That would be to enforce we are only checking for str. Mostly as a sanity check since I don't think we should be calling _mergeplus() as data[key] += value in those cases, but instead check for specialized overloads.

isinstance(OrderedDict(), dict) is true, isinstance(AMappingSubclass(), dict) is not. isinstance(AMappingSubclass(), (dict, collections.abc.Mapping)) should cover this base class as well.

Hmm, how about if I implement the general case for isinstance(value), but keep type(data_val) == dict for sanity check until we implement attrs?

But that's what I don't understand: why should we enforce str, or dict, exactly? :) Why bother? As long as value is an instance of str (or one of its subclasses), it is string-like enough for our use. We can measure its length, we can call its join(), and we can add it to another string with +=.

I see no benefit in sticking to the exact type only, TBH. From my POV, if isinstance(value, (dict, Mapping)) is true, then value is simply a dictionary, maybe in a trenchcoat, but it still can be iterated over. We perform no black magic with dict internals - we copy keys and values between dictionaries, and from time to time we call get(), that's all :) It's actually quite common, BTW, e.g. depending on the parser type, ruamel.yaml may materialize a YAML file as a pile of OrderedDict instances - by allowing dict only, we would disqualify otherwise perfectly fine dict-like objects. If it walks like a dict, if it quacks like a dict, then we can merge it with another dict-like object.

The same applies to list checks, right? We don't inspect list type internals, all we do is stuff like data[key] = [item for item in data[key] if item not in value] - as long as value is list-like, it should be absolutely acceptable.

I don't understand what issues would be solved by allowing the exact type only, or what issues would it prevent. On the other hand, checking for being list-like, dict-like, or string-like opens the door to a wider array of compatible input types.

Copy link
Contributor Author

@LecrisUT LecrisUT Aug 24, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

About data_val, it will always be a dict/list/str, etc., unless we open the yaml to beyond safe_loader, at which point ambiguity enters (e.g. the original type might no longer be a dict due to a yaml tag, but with fmf we want to add to the constructor). If we keep it strict on data_val, we would at least catch any issues if something else altered the yaml loader or anything else.

For value indeed they have the same interface so the operations will allow us to do the operations. It's just that we are assuming it is what the author of the dict-like, str-like object intends. I think I have an example of a possible confusion in this same PR.

Consider Tree. It has a dict-like structure through __getitem__, get(), etc., but how I have it implemented is that keys, values, etc. are wrappers to Tree.data (because it is easier to implement and because child is more natural to access using file-api), while additional child or nested items are gathered using __getitem__ and get(). This could create a confusion when using _merge_plus().

For the time being though and until there's a plugin system to add more specialized rules for this, it would be helpful to add the isinstance checks for value. But I'm not sure how to keep track or denote the places where we might want to change it to have more specialized rules. Due to type-hinting there are quite alot of isinstance that we would have to filter through.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider Tree. It has a dict-like structure through __getitem__, get(), etc., but how I have it implemented is that keys, values, etc. are wrappers to Tree.data (because it is easier to implement and because child is more natural to access using file-api), while additional child or nested items are gathered using __getitem__ and get(). This could create a confusion when using _merge_plus().

I think this example goes further than I wanted: yes, Tree might look similar to a dictionary, it does have some methods with similar semantics, but it's not a dict subclass. Therefore it would not pass the isinstance(value, dict) test, and our code would not try to combine it with other values.

My point was to allow subclasses of dict or list, e.g. ruamel.yaml.comments.CommentedMap, not classes that look like dict or list, i.e. class tree, not class API. To replace type(data_val) == type(value) == dict with isinstance(data_val, dict) and isinstance(value, dict) - this would not open the door to dict-like classes like Tree. If by any chance a Tree instance would be passed to _merge_plus(), I would expect none of these tests to match, and such a value would be reported as something fmf does not know how to merge with the existing tree.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about isinstance(value, (dict, Mapping))? I think it's useful for Tree to have type-hint of Mapping for further down the line, e.g. if it's used in a generic map input.

fmf/base.py Show resolved Hide resolved
Signed-off-by: Cristian Le <[email protected]>
Signed-off-by: Cristian Le <[email protected]>
Signed-off-by: Cristian Le <[email protected]>
Signed-off-by: Cristian Le <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants